Add logical range partitioning representation #22777
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported that the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
cc: @gabotechs @stuhood
| return not_impl_err!( | ||
| "Physical plan does not support Range repartitioning" | ||
| ); |
This is a TODO, right? Should it point at a ticket?
Yes, good catch. I can add the epic: #22395
I created small issues for each not-implemented case. The descriptions are not great, but they are enough for tracking for now.
7f0949d to d3907a8
Just left some non-blocking comments. Pretty straightforward PR! Thanks @gene-bordegaray and @stuhood.
| // TODO: Support planning LogicalPlan::Repartition with | ||
| // range partitioning. | ||
| // Tracked by https://github.com/apache/datafusion/issues/22786 | ||
| return not_impl_err!( | ||
| "Physical plan does not support Range repartitioning" | ||
| ); | ||
| } |
Don't we have everything we need already for creating this mapping?
LogicalPartitioning::Range(v) => {
    let sort_exprs = create_physical_sort_exprs(
        v.ordering(),
        input_dfschema,
        execution_props,
    )?;
    Partitioning::Range(RangePartitioning::new(
        LexOrdering::new(sort_exprs).unwrap(),
        v.split_points().to_vec(),
    ))
}
| .map(ToString::to_string) | ||
| .collect::<Vec<_>>() | ||
| .join(", "); | ||
| let split_points = self |
Using the .join() method from itertools::Itertools should allow you to avoid the Vec allocation:
| .map(ToString::to_string) | |
| .collect::<Vec<_>>() | |
| .join(", "); | |
| let split_points = self | |
| .map(ToString::to_string) | |
| .join(", "); |
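For readers without itertools at hand, the idea behind the suggestion can be sketched with the standard library alone. This is only an illustration of what "avoiding the Vec allocation" means, not the code in this PR: `join_display` and `join_via_vec` are hypothetical helpers that write each item straight into one `String` versus collecting a `Vec<String>` first. As far as I know, `Itertools::join` only requires `Display` on the items, so the `.map(ToString::to_string)` step may be unnecessary with it as well.

```rust
use std::fmt::{Display, Write};

/// Hypothetical helper (not part of DataFusion or itertools): joins
/// displayable items with `sep` without building a Vec<String> first.
fn join_display<I>(items: I, sep: &str) -> String
where
    I: IntoIterator,
    I::Item: Display,
{
    let mut out = String::new();
    for (i, item) in items.into_iter().enumerate() {
        if i > 0 {
            out.push_str(sep);
        }
        // Formatting into a String only fails if the item's Display impl fails.
        write!(out, "{item}").expect("Display impl returned an error");
    }
    out
}

/// The pattern currently in the diff: one String per item, then a Vec, then join.
fn join_via_vec<I>(items: I, sep: &str) -> String
where
    I: IntoIterator,
    I::Item: Display,
{
    items
        .into_iter()
        .map(|item| item.to_string())
        .collect::<Vec<_>>()
        .join(sep)
}
```

Both helpers produce the same string; the first skips the per-item `String` and the intermediate `Vec`.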
| .split_points() | ||
| .iter() | ||
| .map(ToString::to_string) | ||
| .collect::<Vec<_>>() |
| } | ||
| #[tokio::test] | ||
| async fn logical_range_repartition_is_not_supported() -> Result<()> { |
It would be cool if we could make this actually pass in this PR.
Which issue does this PR close?
Rationale for this change
Declared scan output partitioning should use logical partitioning metadata, not physical partitioning types. This adds logical range partitioning so range-partitioned sources can declare their layout at the logical layer.
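As a rough illustration of what a range layout carries, here is a minimal, self-contained sketch. It is not the API added by this PR: `RangeLayout`, its `i64` keys, and the rule that each split point is the inclusive lower bound of the partition to its right are all assumptions made for the example. It shows the two ideas mentioned in this description: validating split points and mapping a key to a partition.

```rust
/// Hypothetical stand-in for a range layout over i64 keys.
/// Not DataFusion's `RangePartitioning`; names and semantics are assumed.
#[derive(Debug)]
struct RangeLayout {
    split_points: Vec<i64>,
}

impl RangeLayout {
    /// Accepts only strictly increasing split points (assumed validation rule).
    fn try_new(split_points: Vec<i64>) -> Result<Self, String> {
        if split_points.windows(2).any(|w| w[0] >= w[1]) {
            return Err("split points must be strictly increasing".to_string());
        }
        Ok(Self { split_points })
    }

    /// N split points define N + 1 partitions.
    fn partition_count(&self) -> usize {
        self.split_points.len() + 1
    }

    /// Index of the partition holding `key`, assuming each split point is the
    /// inclusive lower bound of the partition to its right.
    fn partition_for(&self, key: i64) -> usize {
        // Binary search: counts how many split points are <= key.
        self.split_points.partition_point(|sp| *sp <= key)
    }
}
```

With split points `[10, 20]`, this sketch yields three partitions: keys below 10, keys from 10 up to but not including 20, and keys of 20 and above.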
What changes are included in this PR?
Partitioning::Range and RangePartitioning.
SplitPoint and shared split-point validation to datafusion-common.
Are these changes tested?
Yes. Unit tests added.
Are there any user-facing changes?
Yes. This adds a public logical range partitioning API. No breaking API changes.